In privacy-preserving machine learning, it is common that the owner of the learned model does not have any physical access to the data. Instead, only secure remote access to a data lake is granted to the model owner, without any ability to retrieve data from it. However, the model owner may wish to periodically export the trained model from the remote repository, and the question arises whether this could cause data leakage. In this paper, we introduce the concept of a data stealing attack during neural network export. It consists in hiding information in the exported network that allows the reconstruction, outside the data lake, of images originally stored in that data lake. More precisely, we show that it is possible to train a network that performs lossy image compression while simultaneously solving a utility task such as image segmentation. The attack then proceeds by exporting the compression decoder network together with some image codes, which enables the reconstruction of images outside the data lake. We explore the feasibility of such attacks on databases of CT and MR images, showing that perceptually meaningful reconstructions of the target dataset can be obtained, and that the stolen dataset can readily be used to solve a broad range of tasks. Comprehensive experiments and analyses show that data stealing attacks should be considered a threat to sensitive imaging data sources.
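To make the mechanism concrete, here is a minimal sketch, assuming a toy encoder/decoder and an equal weighting of the two losses (none of which are the paper's exact choices): a shared encoder serves an image-segmentation utility task, while a hidden decoder learns to reconstruct the input from the compact latent code that an attacker would export together with the model.

```python
# Hypothetical sketch of the attack idea, not the paper's architecture:
# a shared encoder trained for segmentation (the utility task) whose compact
# latent code can also be decoded back into the input image. Exporting the
# decoder together with stored codes would leak the images.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CovertCompressor(nn.Module):
    def __init__(self, code_dim=64, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(                        # shared backbone
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, code_dim, 3, stride=2, padding=1),
        )
        self.seg_head = nn.Conv2d(code_dim, num_classes, 1)  # utility task head
        self.decoder = nn.Sequential(                        # covert reconstruction branch
            nn.ConvTranspose2d(code_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        code = self.encoder(x)                               # compact "image code"
        return self.seg_head(code), self.decoder(code), code

model = CovertCompressor()
x = torch.randn(4, 1, 64, 64)                                # e.g. CT/MR slices
y = torch.randint(0, 2, (4, 64, 64))                         # segmentation labels
logits, recon, code = model(x)
loss = F.cross_entropy(F.interpolate(logits, size=x.shape[-2:]), y) \
       + F.mse_loss(recon, x)                                # utility loss + hidden compression loss
loss.backward()
```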
Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers a very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression. We build upon the VQ-VAE framework and introduce several modifications. First, we replace the vanilla vector quantizer with a product quantizer. This intermediate solution between vector and scalar quantization allows for a much wider set of rate-distortion points: it implicitly defines high-quality quantizers that would otherwise require intractably large codebooks. Second, inspired by the success of Masked Image Modeling (MIM) in the context of self-supervised learning and generative image models, we propose a novel conditional entropy model which improves entropy coding by modelling the co-dependencies of the quantized latent codes. The resulting PQ-MIM model is surprisingly effective: its compression performance is on par with recent hyperprior methods. It also outperforms HiFiC in terms of FID and KID metrics when optimized with perceptual losses (e.g. adversarial). Finally, since PQ-MIM is compatible with image generation frameworks, we show qualitatively that it can operate in a hybrid mode between compression and generation, with no further training or finetuning. As a result, we explore the extreme compression regime where an image is compressed into 200 bytes, i.e., less than a tweet.
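As a side note, the product-quantization step can be illustrated with a short sketch: the latent vector is split into M sub-vectors, each quantized against its own small codebook, so the effective codebook has K^M entries without ever being materialized. Dimensions and codebook sizes below are illustrative assumptions; the MIM-based entropy model is not shown.

```python
# Minimal product-quantization sketch: split each latent into M sub-vectors
# and assign each one to its nearest entry in a per-subspace codebook.
import torch

def product_quantize(z, codebooks):
    """z: (N, D) latents; codebooks: (M, K, D//M) sub-codebooks."""
    M, K, d = codebooks.shape
    z = z.view(z.shape[0], M, d)                        # split into M sub-vectors
    dists = torch.cdist(z.transpose(0, 1), codebooks)   # (M, N, K) distances
    codes = dists.argmin(dim=-1)                        # (M, N) integer codes
    zq = torch.stack([codebooks[m][codes[m]] for m in range(M)], dim=1)
    return zq.reshape(z.shape[0], M * d), codes.t()     # quantized latents, codes (N, M)

z = torch.randn(8, 256)
codebooks = torch.randn(16, 64, 16)                     # M=16 sub-quantizers, K=64 entries each
zq, codes = product_quantize(z, codebooks)
```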
We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, ``submodels'', with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights, and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNext. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
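The objective can be sketched as follows, assuming a backbone with stochastic depth enabled so that two forward passes on the same batch instantiate two different submodels; the mixing weight and the use of soft-target cross-entropy (PyTorch >= 1.10) are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of submodel co-training: two forward passes with independent
# stochastic-depth drop patterns, each supervised by the one-hot label and by
# the other's softened (detached) predictions.
import torch.nn.functional as F

def cosub_loss(model, x, target, lam=0.5):
    logits_a = model(x)        # submodel A: one random drop pattern
    logits_b = model(x)        # submodel B: another random drop pattern
    ce = F.cross_entropy(logits_a, target) + F.cross_entropy(logits_b, target)
    # each submodel acts as a soft teacher for the other
    kd = F.cross_entropy(logits_a, logits_b.detach().softmax(dim=-1)) \
       + F.cross_entropy(logits_b, logits_a.detach().softmax(dim=-1))
    return ((1 - lam) * ce + lam * kd) / 2

# usage (assuming e.g. a timm model with stochastic depth enabled):
# model = timm.create_model("vit_small_patch16_224", drop_path_rate=0.2)
# loss = cosub_loss(model, images, labels)
```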
Visual SLAM -- Simultaneous Localization and Mapping -- in dynamic environments typically relies on identifying and masking image features on moving objects to prevent them from negatively affecting performance. Current approaches are suboptimal: they either fail to mask objects when needed or, on the contrary, mask objects needlessly. Thus, we propose a novel SLAM system that learns when masking objects improves its performance in dynamic scenarios. Given a method to segment objects and a SLAM algorithm, we give the latter the ability of Temporal Masking, i.e., to infer when certain classes of objects should be masked to maximize any given SLAM metric. We do not impose any prior on motion: our method learns to mask moving objects by itself. To avoid high annotation costs, we created an automatic annotation method for self-supervised training. We constructed a new dataset, named ConsInv, which includes challenging real-world dynamic sequences both indoors and outdoors. Our method reaches the state of the art on the TUM RGB-D dataset and outperforms it on the KITTI and ConsInv datasets.
Spoken medical dialogue systems are attracting increasing interest as a way to enhance access to healthcare services and to improve the quality and traceability of patient care. In this paper, we focus on medical drug prescriptions acquired on smartphones through spoken dialogue. Such a system would facilitate the traceability of care and free up clinicians' time. However, there is a lack of speech corpora for developing such systems, since most of the related corpora are in text form and in English. To facilitate the research and development of spoken medical dialogue systems, we present, to the best of our knowledge, the first spoken medical drug prescription corpus, named PxSLU. It contains 4 hours of transcribed and annotated dialogues of drug prescriptions in French, acquired through an experiment with 55 participants who were experts and non-experts in prescriptions. We also present experiments that demonstrate the interest of this corpus for the evaluation and development of medical dialogue systems.
Zero-shot learning (ZSL) aims to recognize classes for which no visual sample is available at training time. To address this problem, one can rely on a semantic description of each class. A typical ZSL model learns a mapping between the visual samples of seen classes and their corresponding semantic descriptions, in order to do the same on unseen classes at test time. State-of-the-art methods rely on generative models that synthesize visual features from a class prototype, so that a classifier can then be learned in a supervised manner. However, these approaches are usually biased towards seen classes, whose visual instances are the only ones that can be matched to a given class prototype. We propose a regularization method that can be applied to any conditional generative ZSL method and that only leverages the semantic class prototypes. It learns to synthesize discriminative features for possible semantic descriptions that are not available at training time, i.e. unseen ones. The approach is evaluated on four datasets commonly used in the literature, in both inductive and transductive settings, with results on par with or better than existing methods.
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling with an attention-based aggregation layer akin to a single transformer block, which weights how the patches are involved in the classification decision. We plug this learned aggregation layer into a simple patch-based convolutional network parametrized by two parameters (width and depth). In contrast with a pyramidal design, this architecture family maintains the input patch resolution across all layers. It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption, as shown by our experiments on various computer vision tasks: object classification, image segmentation and detection.
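A rough sketch of such a learned aggregation layer, assuming a single-query cross-attention block in place of global average pooling (the sizes below are illustrative, not the paper's configuration):

```python
# Sketch of attention-based aggregation: one learned query token cross-attends
# to the patch feature map; the attention weights expose which patches drive
# the classification decision.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim=384, num_heads=6, num_classes=1000):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patches):                               # patches: (B, N, dim)
        q = self.cls_token.expand(patches.shape[0], -1, -1)   # single query per image
        pooled, attn_weights = self.attn(q, patches, patches)
        return self.head(pooled.squeeze(1)), attn_weights     # logits, per-patch attention

pool = AttentionPooling()
feats = torch.randn(2, 196, 384)                              # e.g. 14x14 patch features
logits, weights = pool(feats)
```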
Pre-training models on large-scale datasets such as ImageNet is standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that only leverages the target task data. We consider datasets such as Stanford Cars, Sketch or COCO, which are orders of magnitude smaller than ImageNet. Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets from different domains. On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses supervised ImageNet pre-training in a comparable setting.
We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches. We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. Our method can operate at any resolution and creates watermarks robust to a broad range of transformations (rotations, crops, JPEG, contrast, etc.). It significantly outperforms previous zero-bit methods, and its performance on multi-bit watermarking is on par with state-of-the-art encoder-decoder architectures trained end-to-end for watermarking. Our implementation and models will be made publicly available.
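A rough sketch of the zero-bit variant of this idea, under the assumption that a frozen pre-trained backbone is available and that the mark is a secret carrier direction in its output space; the backbone, augmentations, loss and number of steps below are illustrative choices, not the paper's exact procedure.

```python
# Hedged sketch of latent-space marking with a frozen network: perturb the
# image so that features of augmented views align with a secret carrier
# direction, which makes the mark robust to the augmentations used.
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=None).eval()       # stand-in for a pre-trained network
for p in backbone.parameters():
    p.requires_grad_(False)

carrier = torch.randn(1000)                           # secret direction in latent space
carrier = carrier / carrier.norm()
augment = T.Compose([T.RandomResizedCrop(224, scale=(0.5, 1.0)), T.ColorJitter(0.2, 0.2)])

img = torch.rand(1, 3, 224, 224)
delta = torch.zeros_like(img, requires_grad=True)     # watermark perturbation
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(10):                                   # a few marking steps
    view = augment(img + delta)                       # data augmentation at marking time
    feat = backbone(view).squeeze(0)
    loss = -(feat @ carrier) + 0.1 * delta.abs().mean()   # align with carrier, keep delta small
    opt.zero_grad()
    loss.backward()
    opt.step()
```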
Modern approaches for fast retrieval of similar vectors on billion-scale datasets rely on compressed-domain methods such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or another objective function tailored to the retrieval problem. In this paper, we reinterpret popular methods such as binary hashing or product quantizers as auto-encoders, and point out that they implicitly make suboptimal assumptions on the form of the decoder. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes, which translates into better performance in nearest neighbor search. Our method significantly improves over binary hashing methods and product quantization on popular benchmarks.
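The decoder-side idea can be sketched as follows: the codes produced by a fixed product quantizer stay unchanged (hence backward compatibility with existing indexes), and only a neural decoder is trained to map them to better reconstructions of the original vectors. Dimensions, training loop and decoder architecture below are illustrative assumptions.

```python
# Sketch of a backward-compatible decoder: the PQ encoder (code assignment)
# is frozen, and a small network learns a better mapping from the looked-up
# centroids back to the original vectors.
import torch
import torch.nn as nn

D, M, K = 128, 8, 256
codebooks = torch.randn(M, K, D // M)                 # frozen PQ codebooks

def pq_encode(x):                                     # fixed encoder: nearest sub-centroids
    xs = x.view(x.shape[0], M, D // M).transpose(0, 1)
    return torch.cdist(xs, codebooks).argmin(dim=-1).t()      # (N, M) codes

def pq_lookup(codes):                                 # baseline decoder: concatenate centroids
    return torch.cat([codebooks[m][codes[:, m]] for m in range(M)], dim=1)

decoder = nn.Sequential(nn.Linear(D, 512), nn.ReLU(), nn.Linear(512, D))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.randn(1024, D)                              # training vectors
codes = pq_encode(x)                                  # codes are unchanged (backward compatible)
for _ in range(100):
    recon = decoder(pq_lookup(codes))                 # learn a better code-to-vector mapping
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```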